Investigating the Use of Color in Timeline Displays


INTRODUCTION 

The  increasing  use  of  color  in  computer  displays  is 
ubiquitous.  For  example,  automatic  teller  machines,  cell 
phones,  and  handheld  personal  assistant  devices  have  color 
displays,  not  necessarily  because  they  provide  additional 
functionality,  but  because  we  like  the  variation  that  color 
provides.  The  recent  dramatic  increase  of  color  use  in 
air  traffic  control  displays  is  also  a  function  of  consumer 
demand,  because  the  use  of  different  color  categories 
can convey critical information needed in rapid real-time
decision support. However, with advanced display
technologies  that  allow  designers  to  use  hundreds  of  color 
conventions  with  no  added  system  cost,  there  is  little,  if 
any,  consideration  of  how  much  color  might  be  too  much 
from  an  information  processing  perspective. 

Xing  and  Schroeder  (2005)  have  documented  the 
extensive  and  inconsistent  color  use  in  air  traffic  control 
(ATC)  displays.  Yuditsky,  Sollenberger,  Della  Rocco, 
Friedman-Berg, and Manning (2002) discovered conflicting
results in the use of color in radar displays and warned
that  more  investigation  was  needed  to  determine  how 
the  use  of  color  affected  controller  performance.  While 
individual  color  enhancements  seemed  to  provide  benefit, 
when  the  enhancements  were  integrated,  the  beneficial 
effect  was  lost.  Despite  these  findings,  the  Federal  Aviation 
Administration  (FAA)  has  issued  no  formal  requirements 
for  the  use  of  color  in  ATC  displays,  and  consequently 
manufacturers  of  ATC  technologies  are  free  to  develop 
their  own  color  schemes.  Indeed,  even  ATC  facilities 
and  individual  users  are  allowed  to  determine  their  own 
color  preferences  in  some  decision-support  tools.  While 
guidelines  exist  for  the  general  use  of  color  in  ATC  display 
technologies (Cardosi & Hannon, 1999; Reynolds, 1994),
they generally address optimal perceptual conditions and
not how the use of color will improve or degrade task
performance. Because of the paucity of research on the effects
of  increasing  color  categorizations  on  human  supervisory 
control  performance,  more  specifically  ATC  tasks,  this 
paper  details  the  results  from  an  experiment  designed  to 
evaluate  different  color  categories.  These  categories  were 
used  in  a  timeline  in  an  attempt  to  objectively  measure 
how  the  use  of  color  in  air  traffic  control  displays  affects 
performance. 

The  use  of  color  to  aid  in  information  processing  dates 
back  to  early  WWII  aviation  days  in  which  knobs,  levers, 
and buttons in the cockpit were often painted yellow
to convey that caution should be used before activating
the control (such as an emergency jettison device).
Some devices were painted red to remind pilots that they
should only be activated in extreme cases (such as firing
a weapon or dropping a bomb). This use of yellow and
red  to  convey  caution  and  warning  information  is  still 
used  today  in  modern  cockpits  and  has  become  a  deeply 
ingrained  heuristic  for  daily  life  as  traffic  lights,  signs,  and 
labels  still  use  this  color  convention. 

Color in ATC displays is typically used for three primary
task reasons: 1) to draw attention, 2) to identify
categories of information, and 3) to organize information
through color segmentation (Xing & Schroeder, 2006).
Research  has  shown  that  color  is  superior  to  achromatic 
visual  attributes  (e.g.,  luminance,  shapes,  and  text)  in 
search  and  organization  tasks  primarily  because  color- 
coded  information  can  be  processed  more  quickly  (Christ, 
1975).  For  example,  the  use  of  red  in  displays  to  convey 
warning  information  allows  operators  such  as  pilots  and 
ATC  personnel  to  quickly  assess  a  problem  state. 

Despite  the  improvements  in  search  and  organization 
tasks  color  can  provide,  previous  research  has  shown  that 
while  subjects  believed  that  color  improved  their  ability 
to  detect  details,  objectively,  color  did  not  improve  target 
detection or identification (Jeffrey & Beck, 1972). In addition,
the use of color can cause cognitive tunneling or
“inattentional  blindness,”  in  which  operators  may  miss 
other  important  information  on  a  display  because  they 
fixate  on  the  more  salient  and  compelling  color  change 
(Simons,  2000).  In  addition,  as  the  number  of  displayed 
colors  increases,  along  with  often  dual  or  triple  meaning 
to  the  different  colors,  users’  perceptual  and  cognitive  load 
is  increased,  subsequently  elevating  mental  workload  as 
well  as  increasing  the  likelihood  of  slips  and  errors. 

Despite  the  increasing  use  of  color  in  ATC  displays,  a 
principled objective evaluation of the impact of color usage
has not been conducted. In general, subjective evaluations
from  air  traffic  controllers  have  rated  the  use  of  color 
positively  in  the  context  of  reducing  mental  workload  and 
job  complexity  (Yuditsky  et  al.,  2002).  While  subjective 
evaluations  can  provide  meaningful  feedback,  previous 
research  indicates  that  although  people  like  color  usage 
in  displays  and  think  it  improves  their  performance,  in 
fact, it may not (Jeffrey & Beck, 1972). Hence, this study
was  intended  to  provide  objective  measures  of  the  effect 
of  color  in  displays. 

METHODS

Apparatus,  Participants,  and  Procedure 

To  objectively  investigate  the  use  of  color  in  an  ATC- 
related  task,  a  human-in-the-loop  simulation  test  bed 
was  programmed  in  MATLAB®.  Since  the  subject  pool 
was  primarily  made  up  of  college  students,  a  simplified 
ATC  task  was  needed  that  contained  realistic  decision 
support  tools,  yet  did  not  require  years  of  expertise  to 
effectively  operate.  Thus,  the  subjects’  task  was  that  of 
a  low-level  surface  manager  of  incoming  and  outgoing 
traffic,  responsible  for  assisting  a  supervisor  in  ensuring 
that  enough  personnel  were  on  hand  for  baggage  handling, 
aircraft  captains,  and  galley  service. 

The  simulation  interface  shown  in  Figure  1  consists  of 
a  plan  view  (map)  radar  display,  two  timelines  (arriving 
and departing), and a datalink interface for displaying and
responding  to  questions.  The  radar  screen  represents  the 
local airspace that shows incoming and outgoing traffic
in a terminal control area. The circle in the middle represents
the airport area in which aircraft are not displayed.
The timeline contains two essential elements, much like
what is used in actual ATC timelines: incoming (arriving)
traffic (left) and outbound (departing) traffic (right).
The  incoming  side  of  the  timeline  represents  the  time 
until  the  expected  aircraft  gate  arrival.  The  outgoing 
side  of  the  timeline  represents  the  time  that  an  aircraft 
begins  loading  passengers  and  baggage  at  the  gate  until 
it  becomes  airborne.  Each  aircraft  tag  contains  the  flight 
number,  number  of  passengers,  number  of  baggage 
items,  assigned  gate  number,  speed  (when  airborne)  and 
altitude  (when  airborne).  The  data  on  the  timeline  and 
the  situation  display  are  dynamically  updated  every  30 
sec,  mimicking  the  information  updating  in  ATC  radar 
displays. The datablocks on the situation display enter
the screen from random locations and move into and
out of the screen at the simulated speed. In the experiment,
subjects performed dual tasks: They monitored both the
radar display and timeline, and they answered questions
from their superiors through datalink (text message)
communication.

Figure 1: The simulated timeline interface. The left timeline shows the aircraft due to arrive at their gates
within the next 30 minutes, and the right timeline shows those due to take off within the next 30 minutes. The
radar display on the left represents the terminal control airspace around the airport, with the innermost circle
representing the tower-controlled airspace. The data link interface is on the bottom of the screen and is where
subjects record their answers.

Training and testing were conducted using a Dell Pentium
4 computer that had a 17-inch color monitor with
a screen area of 1024x768 pixels and 16-bit high color
resolution. During testing, all user responses were recorded
in separate files specific to each subject and scenario. A
Visual Basic script was written to score and compile the
data  into  a  single  spreadsheet  file  for  the  subsequent 
statistical  analysis.  After  signing  required  consent  forms, 
subjects  completed  a  tutorial  that  discussed  the  nature 
of  the  experiment,  explained  the  context  and  use  of  the 
interface,  and  outlined  the  different  color  categories  they 
would  experience  in  a  graphical  format.  Table  1  describes 
the  color  categories  in  tabular  format.  Table  2  details 
the  RGB  vector  of  each  of  the  nine  colors  used.  Subjects 
completed  three  practice  scenarios,  which  exposed  them 
to  all  three  possibilities  of  color  category  (3,  6,  9  colors). 
They then began the 18 randomly assigned test scenarios
that  lasted  approximately  3.5  minutes  each. 

During each scenario, subjects were required to monitor
the radar display and timeline, and answer questions from
their superiors through datalink (text message) communication.
Two types of questions were randomly mixed in
each scenario. One type was search questions (SQ), such
as,  “How  many  baggage  items  are  aboard  Delta  768?” 
The  other  type  was  problem-solving  questions  (PSQ) 
such  as,  “How  many  aircraft  will  depart  in  the  next  20 
minutes?”  Appendix  A  lists  all  the  questions  used  in  the 
experiment.  Subjects  were  also  required  to  notify  their 
superior  when  aircraft  of  a  particular  airline  entered  the 
middle  circle  of  the  spatial  display.  This  technique  was 
used  to  evaluate  possible  errors  of  omission  related  to 
increasing  workload. 

Experimental  Design 

The  primary  independent  variable  of  interest  in  this 
experiment  was  the  number  of  colors  used  to  represent 
categorical  information  about  incoming  and  outgoing 
aircraft,  which,  as  depicted  in  Table  1,  were  three,  six,  and 
nine  colors.  Two  secondary  independent  variables  were 
investigated,  number  of  aircraft  (10,  20,  30)  and  arrival 
pattern (sequential vs. non-sequential). Thus, the statistical
model used was a 3x3x2 fully crossed ANOVA, and
the 18 scenarios were randomly presented to a total of 29
subjects, all of whom were college students with normal vision.
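
The fully crossed design can be made concrete with a short enumeration. The following is a minimal sketch in Python, not the MATLAB test bed used in the study; the function names and the per-subject seed are assumptions introduced only for illustration. It lists the 3x3x2 = 18 factor combinations and shuffles them reproducibly for one subject.

```python
import itertools
import random

# Factor levels as stated in the text; the names are illustrative only.
AIRCRAFT_LEVELS = (10, 20, 30)
COLOR_LEVELS = (3, 6, 9)
ARRIVAL_PATTERNS = ("sequential", "non-sequential")


def build_scenarios():
    """Return every combination of the three factors (3 x 3 x 2 = 18)."""
    return list(itertools.product(AIRCRAFT_LEVELS, COLOR_LEVELS, ARRIVAL_PATTERNS))


def randomized_order(subject_seed):
    """Return the 18 scenarios in a random order that is reproducible per subject.

    The seed is a hypothetical device; the paper does not describe how the
    random presentation order was generated.
    """
    scenarios = build_scenarios()
    rng = random.Random(subject_seed)
    rng.shuffle(scenarios)
    return scenarios
```

Calling randomized_order(1) yields one possible presentation order in which every factor combination appears exactly once.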

Table 1: Color Categories

                                        Number of Colors
Flight Status                     3          6          9
Scheduled to arrive               White
En route                                     Blue
En route outside airspace                               Blue
En route inside airspace                                Cyan
On final approach                            Orange     Orange
Taxiing in                        Yellow     Yellow     Yellow
On runway                                               Red
Ready to dock with gate                                 Purple
At gate                           Green      Green      Green
Ready for pushback                           Pink       Pink
Taxiing out                       Yellow     Yellow     Yellow
In final queue (holding short)               Orange     Orange
Departed                          White      White      White

Table 2: Color RGB Vectors

Color      RGB Vector
White      [1 1 1]
Yellow     [1 1 0]
Green      [0 1 0]
Blue       [0 0.5 1]
Orange     [1 0.5 0]
Pink       [1 0 0.5]
Cyan       [0 1 1]
Red        [1 0 0]
Violet     [0.7 0.5 0.9]

The three levels of aircraft density were included to
examine possible interaction between the number of
onscreen entities (increasing workload) and the color
categories. In addition, arrival patterns could also affect
a controller’s ability to effectively search for information.
Aircraft that maintain their relative positions in
the timeline are easier to track than aircraft that appear
to “jump” on the timeline, e.g., aircraft that are put into
holding patterns, disrupting the expected flow of traffic.
Both the increasing number of aircraft and arrival
patterns  represent  environmental  complexity  while  the 
color  categories  represent  an  intervention  designed  to 
mitigate  complexity. 
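
The RGB vectors in Table 2 use components in the range 0 to 1, the convention of MATLAB color specifications. For readers who wish to reproduce the palette in tools that expect 8-bit values, the following sketch converts such a vector to a hexadecimal color string. The function name is an assumption for illustration, and the 8-bit values are a conversion of the published vectors, not values reported in the study.

```python
def rgb_unit_to_hex(rgb):
    """Convert an RGB triple with components in [0, 1] to a '#RRGGBB' string."""
    if len(rgb) != 3 or any(not 0.0 <= c <= 1.0 for c in rgb):
        raise ValueError("expected three components in the range [0, 1]")
    # Scale each component to 0..255 and format it as two hexadecimal digits.
    return "#" + "".join(f"{round(c * 255):02X}" for c in rgb)


# RGB vectors as listed in Table 2.
TABLE_2 = {
    "White": (1, 1, 1),
    "Yellow": (1, 1, 0),
    "Green": (0, 1, 0),
    "Blue": (0, 0.5, 1),
    "Orange": (1, 0.5, 0),
    "Pink": (1, 0, 0.5),
    "Cyan": (0, 1, 1),
    "Red": (1, 0, 0),
    "Violet": (0.7, 0.5, 0.9),
}

HEX_PALETTE = {name: rgb_unit_to_hex(vec) for name, vec in TABLE_2.items()}
```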

Multiple  dependent  variables  were  used  to  test  the 
effects  of  both  significant  environmental  complexities 
and  complexity  mitigation  strategies.  The  general  strategy 
was to measure performance using the embedded datalink
tool, a strategy that has been useful in developing
workload  metrics  in  a  military  command  and  control 
domain  (Cummings  &  Guerlain,  2004).  The  questions 
introduced  through  the  datalink  window  fell  into  two 
categories:  1)  search  questions  that  relied  on  perceptual 
information  processing,  e.g.,  subjects  had  to  locate  a 
single piece of information such as the call sign of a certain
aircraft, and 2) problem-solving questions, in which
subjects  were  required  to  calculate  or  derive  information 
from  multiple  sources,  e.g.,  the  number  of  aircraft  at  their 
gates.  In  each  of  the  18  scenarios,  subjects  were  asked 
six  questions,  which  were  generally  evenly  split  between 
problem-solving  and  search  categories.  The  dependent 
variables  consisted  of  the  time  subjects  took  to  answer 
both  question  types  as  well  as  the  accuracy  of  the  answers. 
In addition to the requirement that subjects answer all
datalink questions, they also were required to notify the
supervisor when they first noted that a flight of a certain
carrier entered the outermost radial circle on the spatial
display. Failure to recognize this situation resulted in an
error of omission, which is the final dependent variable
to be measured.

Figure 2: Search Question Response Time: Color vs. Aircraft Number

RESULTS 

Response  Time 

Response  times  were  measured  as  the  time  between 
the  arrival  of  a  datalink  question  and  entry  of  a  response. 
Answers  were  intended  to  be  very  short,  i.e.,  all  numeric 
answers,  so  as  not  to  confound  answers  with  typing 
ability. Response times to SQ and PSQ required natural
logarithm transformations to meet ANOVA normality
and homogeneity of variances assumptions. For the SQ
response time, the main effects of the number of aircraft
and color categories were significant (both p < .001, α =
.05). However, because there were significant interactions
for all higher order terms involving color (p < .001), these
results  can  only  be  interpreted  by  examining  the  marginal 
means.  Figures  2  and  3  show  plots  of  marginal  means  for 
time  versus  color  category  for  the  different  numbers  of 
aircraft.  Figure  2  illustrates  results  for  search  questions 
and  Figure  3  for  problem-solving  questions. 
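
The effect of the natural logarithm transformation mentioned above can be illustrated with a small sketch. The response times below are hypothetical values chosen only to show a right-skewed distribution; they are not data from this experiment, and the helper names are assumptions. Response times are bounded below and have a long upper tail, and taking logarithms typically pulls that tail in, which moves the data closer to the normality that ANOVA assumes.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean cubed deviation in standard-deviation units."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_transform(times_sec):
    """Apply the natural logarithm to each (strictly positive) response time."""
    return [math.log(t) for t in times_sec]


# Hypothetical response times in seconds, for illustration only.
raw_times = [4.1, 5.0, 5.6, 6.2, 7.9, 9.5, 12.8, 21.4, 38.0]
log_times = log_transform(raw_times)

# The transformed sample is less right-skewed than the raw sample.
print(round(skewness(raw_times), 2), round(skewness(log_times), 2))
```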

Figure 2 indicates that as the number of aircraft increased,
response times increased for search questions,
which  is  expected  since  there  were  more  entities  to  search 
on  the  radar  display  and  the  timeline.  The  interaction 
was  significant,  and  there  is  no  clear  pattern  that  can  be 
discerned for color. For 10 aircraft, increasing color usage
tended  to  improve  search  time;  however,  for  20  and  30 
aircraft,  increasing  color  usage  did  not  appear  to  either 
help  or  hurt  response  time. 


Figure  3:  Problem-Solving  Response  Time: 
Aircraft  vs.  Color 


For  the  PSQ  response  time,  color  was  not  significant 
but  arrival  pattern  and  the  number  of  aircraft  were  (p  = 
.027 and p < .001, respectively). There were no significant
interactions. Figure 3 demonstrates that increasing
the number of aircraft caused longer problem-solving
time, regardless of color usage. While there appears to
be a large dip for 20 aircraft and 6 color categories, the
difference was only 1 sec. It is likely an indication
that  the  questions  generated  under  this  category  were 
easier  to  answer  than  the  others.  In  future  studies,  greater 
effort  is  needed  to  ensure  parity  of  questions.  Figure  4 
shows  the  plot  of  marginal  means  for  response  time  versus 
arrival  pattern  for  the  different  aircraft  numbers.  This 
graph  demonstrates  that  non-sequential  arrival  patterns 
caused  higher  response  times  than  sequential  patterns. 
That  is  expected  since  controllers  must  rearrange  their 
mental  models  for  the  traffic  picture  when  aircraft  do 
not  arrive  in  a  sequential  fashion  and,  thus,  take  longer 
when  calculating  relevant  information. 


Figure  4:  Problem-Solving  Response  Time: 
Arrival  Pattern  vs.  Aircraft 


Figure  5:  Accuracy  for  Search  Questions 


Performance  Accuracy 

While response times can provide important performance
metric information, it is equally important, if
not  more  so,  to  consider  how  a  particular  experimental 
condition  affected  answer  accuracy.  For  this  experiment, 
subjects  were  classified  as  “accurate”  if  they  achieved 
greater  than  2/3  accuracy  for  test  questions  in  a  particular 
scenario.  An  overall  comparison  of  subject  performance 
accuracy  for  the  search  and  problem-solving  questions 
in  each  of  the  18  test  sessions  reveals  intriguing  results. 
Figure  5  shows  plots  of  the  number  of  correct  and  incorrect 
answers  versus  color  category  for  search  questions.  For 
the  search  questions,  additional  color  categories  increased 
inaccuracy,  most  predominantly  from  three  to  six  color 
categories; however, the general subject population had
a  relatively  high  level  of  accuracy  (Pearson  Chi-Square 
showed  marginal  significance,  p  =  .069). 
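
The accuracy criterion can be stated precisely with a short sketch. The function name below is an assumption for illustration. Because the threshold is "greater than 2/3," a subject who answers exactly four of the six questions in a scenario correctly does not qualify as accurate, while five of six does. Exact fractions are used so that the boundary case is not blurred by floating-point rounding.

```python
from fractions import Fraction

# Threshold from the text: strictly greater than two thirds of the questions.
ACCURACY_THRESHOLD = Fraction(2, 3)


def is_accurate(num_correct, num_questions):
    """Return True if the proportion of correct answers exceeds 2/3."""
    if num_questions <= 0:
        raise ValueError("a scenario must contain at least one question")
    if not 0 <= num_correct <= num_questions:
        raise ValueError("num_correct must lie between 0 and num_questions")
    return Fraction(num_correct, num_questions) > ACCURACY_THRESHOLD
```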

Figure  6  shows  the  relationship  between  the  number 
of  correct  and  incorrect  answers  versus  color  category 
for problem-solving questions. For the problem-solving
questions, the increase in color categories actually
improved answer accuracy, although there was no difference
between the uses of six or nine color categories
(Pearson Chi-Square test showed p < .001). Generally, for
both question types, wrong answers typically increased
with the number of aircraft. Measures of association using
Cramér’s V are reflected in Table 3. For the search
questions,  arrival  patterns  and  number  of  aircraft,  both 
environmental  complexity  factors,  affected  correct  answers 
while  increasing  aircraft  and  color  categories  were  both 
moderately  associated  with  wrong  answers. 
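
For reference, Cramér's V is computed from the Pearson chi-square statistic of a contingency table as V = sqrt(chi-square / (n (k - 1))), where n is the total count and k is the smaller of the number of rows and columns. The sketch below is a plain-Python illustration of that standard formula, not the statistical software used in the study; it assumes every row and column total is positive.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total count for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V: 0 indicates no association, 1 a perfect association."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))
```

In this setting the rows might be the three color conditions and the columns the counts of correct and incorrect answers, giving a 3x2 table with k = 2.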

Omission  Errors 

The incorrect answers given by the subjects to datalink
queries represent errors of commission. We also examined
the influence of the color categories on omission error occurrences,
as well as increasing environmental complexity
due  to  increasing  numbers  of  aircraft  and  non-sequential 
arrival patterns. Subjects were told that whenever an arriving
aircraft marked as BAW (British Airways) entered
the area between the two dashed circles on the radar
display in Figure 1, they were to notify their supervisors
by clicking the “Warn Ground Manager” button on the
lower right part of the screen. A Wilcoxon Signed Rank
Test showed that the difference between the two samples
(correct notifications and errors of omission) was significant
(p = .009), and for all independent variables, correct
answers exceeded errors of omission.

Figure 6: Accuracy for Problem-solving

Table 3: Accuracy Measure of Association

                    Search Accuracy     Problem-Solving Accuracy
Color Categories    .102 (p = .069)     .240 (p < .001)
Aircraft Density    .269 (p < .001)     .275 (p < .001)
Arrival Pattern     .327 (p < .001)     Not significant

Figure 7: Omission Errors: Color Categories

Figure 8: Omission Errors: Number of Aircraft

Figure  7  represents  the  number  of  overall  correct 
notifications  as  compared  to  the  errors  of  omission  for 
the  color  categories.  A  Mann-Whitney  test  between  the 
number  of  errors  for  the  six  and  nine  color  categories  was 
significant (p = .001). Figure 8 demonstrates a similar trend
of omission error increasing between 20 and 30 aircraft
(Mann-Whitney, p < .001). The arrival pattern effect is
seen in Figure 9. Subjects’ errors of omission increased
when the arrival patterns were non-sequential, and the
difference between sequential and non-sequential patterns
was significant (Mann-Whitney, p = .036).

Given  that  all  three  independent  variables  showed 
significance  through  non-parametric  testing,  further 
investigation  was  warranted  to  determine  the  magnitude 
of  the  significant  relationships.  Association  testing  using 
the Kendall tau-b statistic revealed significant associations
for all three variables, as shown in Table 4. As color
categories increased from 3 to 9, and as aircraft increased
from 10 to 30, errors of omission increased, as indicated
by the positive association. Subjects that experienced
sequential arrival patterns made fewer omission errors, as
indicated  by  the  negative  association.  All  associations 
were  moderate,  but  the  factor  of  the  number  of  aircraft, 
an  environmental  complexity  factor,  contributed  only 
slightly  more  to  error  rates  than  the  number  of  colors,  a 
factor  designed  to  mitigate  complexity. 
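
The Kendall tau-b statistic used above is a rank-based measure of association that corrects for ties, which matters here because each factor takes only two or three distinct values. The sketch below is a direct pairwise implementation of the standard formula, tau-b = (C - D) / sqrt((n0 - n1)(n0 - n2)), where C and D are the numbers of concordant and discordant pairs, n0 is the number of pairs, and n1 and n2 count the pairs tied on each variable. It is an illustration under those definitions, not the software used in the study.

```python
import math
from collections import Counter
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall tau-b for two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means the pair is tied on x or y and counts for neither.
    n0 = len(x) * (len(x) - 1) // 2
    ties_x = sum(t * (t - 1) // 2 for t in Counter(x).values())
    ties_y = sum(t * (t - 1) // 2 for t in Counter(y).values())
    denominator = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator
```

With a factor level as x and an omission-error indicator as y, a positive value means that errors become more frequent as the factor level increases, which matches the reading of the signs described above.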


Table 4: Measures of Omission Error Association

Factor             Association    Significance
Color Category     .330           p < .001
No. of Aircraft    .355           p < .001
Arrival Pattern    -.293          [illegible]

Figure 9: Omission Errors: Arrival Pattern

One problem with the previous non-parametric
analyses was the lack of consideration of interaction effects.
Since the numbers of aircraft and color categories
contributed  almost  equally  to  error  rates,  a  graph  of  the 
color  categories  for  each  aircraft  level  was  generated  to 
further  investigate  the  nature  of  any  interaction.  Figure 
10  represents  the  number  of  omission  errors  for  each 
color  category  (3,  6,  9),  as  well  as  the  number  of  aircraft 
(10,  20,  30).  The  largest  increase  in  omission  errors 
occurred  for  those  subjects  with  30  aircraft  and  9  color 
categories. 

Figure 10: Color vs. Number of Aircraft

DISCUSSION 

This  experiment  investigated  how  performance,  as 
measured  by  response  time,  accuracy,  and  omission  errors 
on a simulated ATC-like task, was affected by varying color
categories,  the  number  of  aircraft,  and  arrival  patterns. 
Several  important  trends  were  noted. 

Number  of  Aircraft 

The number of aircraft was included because it represents
a primary source of environmental complexity for
controllers. As expected, as the number of aircraft increased,
the response time for both search and problem-solving
questions increased. In addition, the performance accuracy
declined, and errors of omission increased as the number
of aircraft increased. Increasing numbers of entities
for consideration is a known significant component of
information complexity (Edmonds, 1999) and is cited
as a major source of air traffic control complexity (see
Majumdar & Ochieng, 2002, for a review). This experiment
provides quantitative evidence for this theory.

Arrival  Pattern 

Traffic  flow  is  another  commonly  cited  source  of 
ATC  complexity  (Majumdar  &  Ochieng,  2002).  This 
environmental  complexity  factor  was  represented  in  this 
study  by  arrival  patterns  that  were  sequential  as  opposed 
to non-sequential. It was hypothesized that arrival pattern
would be a less significant factor across dependent
variables, which was the case. Arrival pattern was not
significant for search times but was significant for
problem-solving times. This result is not surprising because
if a person expects to find information about an aircraft
in one area but its original sequence is disrupted, a new
search is initiated, taking more time. Arrival pattern was
moderately associated with the accuracy of answers to search
questions, indicating that perhaps subjects did not recognize
a positional change on the timeline that led to incorrect
answers. Finally, arrival pattern significantly affected
subjects’ errors of omission; however, to a lesser degree
than both the number of aircraft and color categories.
Arrival  patterns  did  not  directly  cause  errors  of  omission, 
but  non-sequential  patterns  increased  search  time,  and 
thus  increased  overall  workload,  which  diverted  attention 
from  the  monitoring  task.  We  need  to  point  out  that  the 
simulated  arrival  patterns  are  over-simplified  compared 
with  those  in  real  ATC  operations.  Thus,  it  is  possible 
that  the  relatively  moderate  association  between  the 
arrival  patterns  and  performance  is  because  we  did  not 
capture  the  complicated  nature  of  the  arrival  patterns  in 
real  ATC  operations. 

Color  Categories 

The number of color categories was the primary independent
variable in this study. While the number of
aircraft and arrival patterns are factors of environmental
complexity and, thus, cannot be controlled in advance,
color categories can be controlled because color-coding
is a design intervention meant to mitigate complexity
and aid users in expeditious and safe handling of aircraft.
Across several different dependent variables, this experiment
suggests that using more color categories (e.g., from
three  to  six)  provides  no  additional  benefit  in  performance; 
moreover,  using  a  large  number  of  color  categories  (more 
than  six)  can  actually  degrade  performance. 

The  results  of  response  times  to  datalink  questions 
were  mixed.  For  search  questions,  increasing  the  number 
of  color  categories  reduced  response  times  for  ten  aircraft 
but  provided  no  benefit  for  higher  aircraft  densities.  For 
problem-solving  questions,  increasing  color  categories 
provided  no  statistical  improvement  in  response  times.  For 
the  answer  accuracy  measure,  color-coding  significantly 
improved  the  rate  of  correct  answers  from  the  three  to  six 
color  category  but  reached  a  plateau  at  six,  so  nine  color 
categories  provided  no  additional  benefit. 

The  analysis  of  errors  of  omission  in  the  context  of 
color  categories  provides  evidence  that  using  more  than 
six  color  categories  can  introduce  performance  problems. 
When  nine  color  categories  were  represented,  errors  of 
omission  increased  significantly  from  the  three  and  six 
color  categories  that  produced  essentially  the  same  error 
rates.  While  the  number  of  aircraft  exhibited  a  slightly 
stronger association for errors of omission, color categories
were a significant contributor to errors of omission, more
so  than  the  arrival  rate  of  aircraft.  While  increasing  color 
categories  does  not  “cause”  people  to  forget  actions,  it 
does  require  subjects  to  spend  more  time  in  search  and 
mapping  tasks,  so  it  may  take  away  time  from  other  tasks 
and increase the likelihood that a subsequent or concurrent
task will be forgotten. This is an important finding
because  color-coding  is  a  design  intervention  meant  to 
mitigate  complexity,  not  add  to  it.  In  this  study,  the 
use  of  color-coding  beyond  six  categories,  especially  for 
high  workload  situations,  caused  degraded  performance 
through  increased  errors  of  omission. 

CONCLUSION 

The  use  of  color-coding  in  human  supervisory  control 
displays  is  a  design  intervention  meant  to  mitigate  task 
complexity  and  reduce  mental  workload.  Color  has  been 
shown  to  aid  operators  in  search  and  organization  tasks. 
In this study, the use of six color categories allowed
subjects to answer search and problem-solving questions
during a monitoring task significantly more accurately
than the use of three color categories.
However,  this  experiment  suggests  that  beyond  six  color 
categories, performance accuracy is not aided and is possibly
degraded. In addition, errors of omission significantly
increased from six to nine color categories, so increasing
color usage might prompt inattentional blindness or
cognitive tunneling.

Investigation of other environmental sources of complexity
revealed that increasing the number of aircraft affected
subjects’ performances slightly more than increasing
color categories; however, varying traffic arrival patterns
was not as strongly associated with degraded performance
as the other two factors. This finding is important because
the use of color in displays is meant to reduce task
complexity, not add to it. This study demonstrated that,
especially under high workloads, color categorization
beyond six groupings resulted in more errors of omission,
even more than an environmental complexity factor
that  cannot  be  controlled.  These  results  are  in  line  with 
previous  recommendations  (Cardosi  &  Hannon,  1999) 
that  no  more  than  six  colors  should  be  used  in  an  ATC 
display.  However,  further  research  is  needed  to  examine 
the  effects  of  multiple  meanings  for  color  categorizations 
and  the  role  of  context  for  these  categorizations. 

APPENDIX A:

Search  questions  and  problem-solving  questions  used  in  the  experiment. 


Scenario  Coding 

111 - 10 aircraft, 3 colors, sequential
112 - 10 aircraft, 3 colors, non-sequential
121 - 10 aircraft, 6 colors, sequential
122 - 10 aircraft, 6 colors, non-sequential
131 - 10 aircraft, 9 colors, sequential
132 - 10 aircraft, 9 colors, non-sequential
211 - 20 aircraft, 3 colors, sequential
212 - 20 aircraft, 3 colors, non-sequential
221 - 20 aircraft, 6 colors, sequential
222 - 20 aircraft, 6 colors, non-sequential
231 - 20 aircraft, 9 colors, sequential
232 - 20 aircraft, 9 colors, non-sequential
311 - 30 aircraft, 3 colors, sequential
312 - 30 aircraft, 3 colors, non-sequential
321 - 30 aircraft, 6 colors, sequential
322 - 30 aircraft, 6 colors, non-sequential
331 - 30 aircraft, 9 colors, sequential
332 - 30 aircraft, 9 colors, non-sequential
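
The scenario codes appear to be positional: the first digit gives the number of aircraft (1 = 10, 2 = 20, 3 = 30), the second the number of color categories (1 = 3, 2 = 6, 3 = 9), and the third the arrival pattern (1 = sequential, 2 = non-sequential). This reading is inferred from the listing and from the codes used in the question table; it is not stated explicitly in the report. Under that assumption, a minimal Python sketch of a decoder might look as follows. The names `Scenario`, `decode_scenario`, and `all_scenario_codes` are hypothetical and introduced here only for illustration.

```python
from typing import List, NamedTuple

# Assumed positional meaning of each digit, inferred from the scenario list.
AIRCRAFT_BY_DIGIT = {"1": 10, "2": 20, "3": 30}
COLORS_BY_DIGIT = {"1": 3, "2": 6, "3": 9}
ARRIVAL_BY_DIGIT = {"1": "sequential", "2": "non-sequential"}


class Scenario(NamedTuple):
    aircraft: int   # number of aircraft on the display
    colors: int     # number of color categories
    arrival: str    # traffic arrival pattern


def decode_scenario(code: str) -> Scenario:
    """Decode a three-digit scenario code such as "132"."""
    if len(code) != 3:
        raise ValueError(f"expected a three-digit code, got {code!r}")
    try:
        return Scenario(
            AIRCRAFT_BY_DIGIT[code[0]],
            COLORS_BY_DIGIT[code[1]],
            ARRIVAL_BY_DIGIT[code[2]],
        )
    except KeyError:
        raise ValueError(f"unknown digit in scenario code {code!r}") from None


def all_scenario_codes() -> List[str]:
    """Enumerate the 3 x 3 x 2 = 18 scenario codes in ascending order."""
    return [
        a + c + p
        for a in AIRCRAFT_BY_DIGIT
        for c in COLORS_BY_DIGIT
        for p in ARRIVAL_BY_DIGIT
    ]
```

For example, `decode_scenario("132")` yields 10 aircraft, 9 color categories, and a non-sequential arrival pattern, and `all_scenario_codes()` enumerates the 18 cells of the 3 x 3 x 2 design.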


Question  Type 

SQ  -  Search  Question:  Answers  to  these  question  types  required  the  search  for  a  single  specific  piece  of  information  on  the 
display. 

PSQ  -  Problem-solving  Question:  This  question  type  required  the  aggregation  of  multiple  pieces  of  information  into  a  single 
result  via  a  count. 


Scenario  Type  Question
111       PSQ   How many aircraft departing in the next 30 minutes are at their gates?
111       PSQ   How many aircraft currently in the air are arriving in the next 20 minutes?
111       SQ    In how many minutes will AFR 3203 be arriving?
111       SQ    How many passengers aboard DAL 145?
111       PSQ   How many flights departing in the next 20 minutes?
111       SQ    At what gate is JAL 0091?
112       PSQ   How many aircraft are arriving in the next 15 minutes?
112       PSQ   How many arriving aircraft are currently taxiing?
112       PSQ   How many aircraft still at their gates are departing in the next 20 minutes?
112       SQ    In how many minutes will SWA 315 be arriving?
112       PSQ   How many flights currently in the air are arriving in the next 25 minutes?
112       SQ    At what gate is UAL 518?
121       PSQ   How many departing aircraft are ready for pushback?
121       PSQ   How many aircraft at their gates are departing in the next 25 minutes?
121       PSQ   How many arriving aircraft are on final approach?
121       SQ    How many passengers aboard AAL 1478?
121       SQ    At what gate is COA 791?
121       PSQ   How many flights currently in the air (and not on final approach) are arriving in the next 20 minutes?
122       PSQ   How many arriving aircraft are on final approach?
122       PSQ   How many aircraft at their gates are departing in the next 30 minutes?
122       PSQ   How many aircraft departing in the next 20 minutes are ready for pushback?
122       SQ    How many baggage items aboard AAL 214?
122       PSQ   How many flights currently in the air (and not on final approach) are arriving in the next 30 minutes?
122       SQ    At what gate is AAL 518?
131       PSQ   How many arriving aircraft are ready to dock with the gate?
131       PSQ   How many arriving aircraft are currently inside the airspace?
131       SQ    In how many minutes will COA 2020 be arriving?
131       SQ    How many bags loaded aboard UAL 335?
131       PSQ   How many arriving aircraft are taxiing in?
131       PSQ   How many departing flights are holding short of a runway?
132       PSQ   How many aircraft, departing in the next 30 minutes, are currently at their gates but not yet ready for pushback?
132       PSQ   How many aircraft departing in the next 20 minutes are ready for pushback?
132       SQ    How many passengers incoming aboard DAL 325?
132       SQ    In how many minutes will LMD 0121 be departing?
132       PSQ   How many taxiing aircraft will be arriving at their gates within the next 15 minutes?
132       PSQ   How many arriving flights are on final approach?
211       PSQ   How many arriving aircraft are currently taxiing?
211       PSQ   How many aircraft are arriving in the next 20 minutes?
211       PSQ   How many aircraft at their gates are departing in the next 30 minutes?
211       SQ    In how many minutes will JBU 312 be arriving?
211       SQ    At what gate is LAN 4971?
211       PSQ   How many flights currently in the air are arriving in the next 20 minutes?
212       PSQ   How many aircraft are arriving in the next 10 minutes?
212       PSQ   How many aircraft departing in the next 30 minutes are currently taxiing?
212       SQ    In how many minutes will TRS 222 be departing?
212       PSQ   How many aircraft currently at their gates will be departing in the next 20 minutes?
212       PSQ   How many flights currently in the air are arriving in the next 15 minutes?
212       SQ    At what gate is AFR 3213?
221       PSQ   How many departing aircraft are holding short of their runways?
221       PSQ   How many departing aircraft are ready for pushback?
221       PSQ   How many aircraft arriving in the next 20 minutes are taxiing?
221       SQ    How many baggage items arriving aboard UAL 323?
221       PSQ   How many flights currently in the air (and not on final approach) are arriving in the next 30 minutes?
221       SQ    To what gate is BAW 733 assigned?
222       PSQ   How many departing aircraft are ready for pushback?
222       SQ    To what gate is AAA 199 assigned?
222       PSQ   How many aircraft arriving in the next 25 minutes are still in the air (and not on final approach)?
222       PSQ   How many aircraft arriving in the next 10 minutes are currently taxiing?
222       SQ    How many baggage items arriving aboard BAW 3514?
222       PSQ   How many flights leaving in the next 25 minutes are still at their gates and not ready for pushback?
231       PSQ   How many departing aircraft are on runways?
231       PSQ   How many aircraft arriving in the next 15 minutes are currently taxiing?
231       PSQ   How many aircraft arriving in the next 20 minutes are on final approach?
231       SQ    At what gate is BAW 942?
231       SQ    How many passengers are arriving aboard NWA 333?
231       PSQ   How many arriving flights are ready to dock with their gate?
232       SQ    At what gate is SNG 5121?
232       PSQ   How many aircraft departing in the next 30 minutes are ready for pushback?
232       SQ    How many baggage items are loaded aboard DAL 2294?
232       PSQ   How many aircraft arriving in the next 30 minutes are still outside the airspace?
232       PSQ   How many aircraft are on final approach?
232       PSQ   How many flights arriving in the next 10 minutes are taxiing in?
311       PSQ   How many aircraft departing in the next 25 minutes are at their gates?
311       PSQ   How many aircraft arriving in the next 5 minutes are currently taxiing?
311       PSQ   How many aircraft currently in the air will be arriving in the next 20 minutes?
311       SQ    In how many minutes will NWA 1116 be arriving?
311       SQ    How many baggage items are arriving aboard BAW 7132?
311       SQ    At what gate is ASA 963?
312       SQ    At what gate is DAL 313?
312       SQ    In how many minutes will SWA 041 be departing?
312       PSQ   How many aircraft currently in the air will be arriving in the next 15 minutes?
312       PSQ   How many aircraft arriving in the next 10 minutes are currently taxiing?
312       SQ    How many baggage items are arriving aboard AFR 9191?
312       PSQ   How many aircraft departing in the next 25 minutes are at their gates?
321       PSQ   How many arriving aircraft are on final approach?
321       PSQ   How many aircraft at their gates are departing in the next 25 minutes?
321       SQ    How many baggage items arriving aboard DAL 393?
321       SQ    In how many minutes will HAL 421 be departing?
321       PSQ   How many flights currently in the air (and not on final approach) are arriving in the next 25 minutes?
321       SQ    At what gate is DAL 1993?
322       SQ    How many baggage items arriving aboard WAW 332?
322       PSQ   How many aircraft arriving in the next 30 minutes are on final approach?
322       SQ    In how many minutes will NER 042 be arriving?
322       PSQ   How many arriving aircraft that are currently taxiing, will be arriving in the next 15 minutes?
322       SQ    At what gate is ARL 238?
322       PSQ   How many flights currently in the air (and not on final approach) are arriving in the next 30 minutes?
331       PSQ   How many departing aircraft are ready for pushback?
331       SQ    How many bags loaded aboard HAL 103?
331       PSQ   How many aircraft arriving in the next 30 minutes are currently on a runway?
331       PSQ   How many arriving aircraft are currently inside the airspace but not on final approach?
331       PSQ   How many departing flights are holding short of a runway?
331       PSQ   How many aircraft departing in the next 30 minutes are still at their gates but not yet ready for pushback?
332       PSQ   How many arriving aircraft are currently inside the airspace but not yet on final approach?
332       PSQ   How many departing aircraft are ready for pushback?
332       PSQ   How many aircraft arriving in the next 15 minutes are currently taxiing?
332       SQ    How many bags are arriving aboard DAL 1223?
332       PSQ   How many aircraft departing in the next 25 minutes are still at their gates and not yet ready for pushback?
332       PSQ   How many departing flights are holding short of a runway?